Sitemap URL Discovery (sitemap.xml + robots.txt → all URLs)
Under maintenance
Pricing
Pay per usage
Developer
Hojun Lee
Maintained by Community
Sitemap URL Discovery
Given a domain, this actor finds sitemap.xml and sitemap_index.xml (also via robots.txt), recursively expands nested sitemaps, and returns one row per discovered URL with lastmod / changefreq / priority. Intended for SEO audits, crawl-target prep, and content cataloging. Costs a $0.01 site fee + $0.0001/URL.
Why this exists
Before you scrape, audit, or index a site, you need to know what's there. The site's own sitemap is the authoritative list — but discovering it requires:
- Checking common paths (sitemap.xml, sitemap_index.xml, wp-sitemap.xml)
- Parsing robots.txt for `Sitemap:` directives
- Recursively walking sitemap-index → child sitemaps
- Parsing each one for `<url>` records
This actor does all of it with sane fallbacks and returns a summary row plus one row per discovered URL.
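The discovery steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the actor's actual code: the function names, the injected `fetch` callable, and the `max_files` default are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for both indexes and URL sets.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt):
    """Return the URLs named in `Sitemap:` lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


def parse_sitemap(xml_text):
    """Split one sitemap document into (child sitemap URLs, <url> records)."""
    root = ET.fromstring(xml_text)
    children = [
        loc.text.strip()
        for loc in root.findall(f"{SITEMAP_NS}sitemap/{SITEMAP_NS}loc")
        if loc.text
    ]
    records = []
    for url in root.findall(f"{SITEMAP_NS}url"):
        record = {"url": (url.findtext(f"{SITEMAP_NS}loc") or "").strip()}
        for field in ("lastmod", "changefreq", "priority"):
            value = url.findtext(f"{SITEMAP_NS}{field}")
            if value is not None:
                record[field] = value.strip()
        records.append(record)
    return children, records


def discover(start_urls, fetch, max_files=50):
    """Walk sitemap indexes breadth-first.

    `fetch(url)` is a hypothetical callable returning XML text, or None
    when the file is missing. Returns (scanned sitemap URLs, URL records).
    """
    queue, seen, scanned, records = list(start_urls), set(), [], []
    while queue and len(scanned) < max_files:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        body = fetch(url)
        if body is None:
            continue
        try:
            children, found = parse_sitemap(body)
        except ET.ParseError:
            continue  # skip broken nested sitemaps instead of aborting
        scanned.append(url)
        queue.extend(children)
        records.extend(found)
    return scanned, records
```

Passing `fetch` in as a parameter keeps the walk testable offline; in practice it would wrap an HTTP client, and the start list would combine the robots.txt results with the common paths (sitemap.xml, sitemap_index.xml, wp-sitemap.xml).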
What you get
Summary row
{
  "_type": "summary",
  "site_url": "https://www.apify.com",
  "sitemaps_scanned": 5,
  "sitemap_urls": [
    "https://www.apify.com/sitemap.xml",
    "https://www.apify.com/sitemap-index.xml",
    "https://www.apify.com/sitemap/actors1.xml",
    ...
  ],
  "urls_discovered": 12384
}
Per-URL row
{"_type": "url", "url": "https://www.apify.com/store/actors/...", "lastmod": "2026-06-08", "changefreq": "weekly", "priority": "0.7"}
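Because the summary and the per-URL rows arrive in the same result set, a consumer will usually want to separate them by `_type` first. The sketch below assumes the rows have already been loaded as Python dicts shaped like the two examples above; `split_rows` and `high_priority` are hypothetical helper names, and the 0.5 threshold is arbitrary.

```python
def split_rows(rows):
    """Separate the single summary row from the per-URL rows."""
    summary = None
    urls = []
    for row in rows:
        if row.get("_type") == "summary":
            summary = row
        elif row.get("_type") == "url":
            urls.append(row)
    return summary, urls


def high_priority(url_rows, threshold=0.5):
    """Return URLs whose priority is at least `threshold`.

    `priority` is a string in the example row, so it is converted
    before comparing; rows with a missing or malformed value are skipped.
    """
    picked = []
    for row in url_rows:
        try:
            if float(row.get("priority", "")) >= threshold:
                picked.append(row["url"])
        except (TypeError, ValueError):
            continue
    return picked
```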
Quick start
Discover all URLs on a domain
{"siteUrl": "https://www.apify.com"}
Only product / actor pages
{"siteUrl": "https://www.apify.com", "pathContains": "/store/actors/", "maxUrls": 5000}
Cap scan size for huge sites
{"siteUrl": "https://en.wikipedia.org", "maxUrls": 100000, "maxSitemapFiles": 50}
Pricing
Pay-Per-Event:
- $0.01: flat fee per site (covers initial discovery)
- $0.0001: per URL row returned
| Run | URLs | Cost |
|---|---|---|
| Small SaaS site | 200 | $0.03 |
| Mid-sized blog | 5,000 | $0.51 |
| Mega site | 100,000 | $10.01 |
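The figures in the table follow directly from the two event prices. As a quick check, the arithmetic can be reproduced with a few lines of Python; `estimate_cost` is a hypothetical helper written for this example, and it assumes one site fee per run.

```python
from decimal import Decimal

SITE_FEE = Decimal("0.01")    # flat fee per site
PER_URL = Decimal("0.0001")   # fee per URL row returned


def estimate_cost(url_count):
    """Estimated cost in USD for one site returning `url_count` URL rows."""
    return SITE_FEE + PER_URL * url_count


for label, count in [
    ("Small SaaS site", 200),
    ("Mid-sized blog", 5_000),
    ("Mega site", 100_000),
]:
    print(f"{label}: {count:,} URLs -> ${estimate_cost(count):.2f}")
```

`Decimal` is used instead of floats so that sub-cent amounts add up exactly.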
For one-off audits, compare this with annual licenses such as Screaming Frog SEO Spider ($259/yr) or Sitebulb ($175/yr).
Use cases
- SEO audit: Pull every URL with its `lastmod`; find stale content
- Crawl planning: Feed URLs into Web → Markdown or your own scraper
- Content monitoring: Detect new URLs by comparing snapshots over time
- Competitor research: See what a competitor's catalog looks like
- Sitemap sanity check: Verify sitemap-index works; catch broken nested sitemaps
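Two of these use cases, stale-content audits and snapshot comparison, come down to simple post-processing of the per-URL rows. The following is a rough sketch, assuming each snapshot is a list of dicts with `url` and optional `lastmod` keys as in the per-URL row example; `new_urls` and `stale_urls` are hypothetical helper names.

```python
from datetime import date


def new_urls(previous_rows, current_rows):
    """URLs present in the current snapshot but not in the previous one."""
    before = {row["url"] for row in previous_rows}
    now = {row["url"] for row in current_rows}
    return sorted(now - before)


def stale_urls(rows, cutoff):
    """URLs whose lastmod date is strictly earlier than `cutoff`.

    Only the leading YYYY-MM-DD part of lastmod is read, so full
    timestamps also work; rows with a missing or coarser value
    (for example a bare year) are skipped rather than guessed at.
    """
    stale = []
    for row in rows:
        lastmod = row.get("lastmod")
        if not lastmod:
            continue
        try:
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue
        if modified < cutoff:
            stale.append(row["url"])
    return stale
```

For example, `stale_urls(rows, date(2025, 1, 1))` would list pages the sitemap reports as unchanged since before 2025. Note that this trusts the site's own `lastmod` values, which are not always kept accurate.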
Limitations
- No HTML scraping fallback: If a site has no sitemap (rare for serious sites), this returns 0 URLs. For HTML link crawling, use a crawl-specific actor.
- Doesn't honor noindex: A URL in the sitemap might still be `noindex` in HTML; this actor returns what's in the sitemap.
Related actors (same author)
- Web Page → Markdown Converter — Convert discovered URLs to text
- HTML Metadata Extractor — Pull meta tags from each URL
- PDF Text Extractor
- JSON Schema Generator
Feedback
A short review helps SEO engineers find it: Leave a review on Apify Store